Concepedia

Concept

Speech Recognition

Variants

Automatic Speech Recognition

Publications: 72.1K

Citations: 3.9M

Authors: 110.6K

Institutions: 9.9K

Dynamic Time Warping Alignment

1956 - 1985

This period centers on time alignment and similarity measures: Dynamic Time Warping (DTW) enables time-normalized matching, with dynamic-programming-based warping guiding word-level alignment across utterances. Front-end representations grounded in Linear Predictive Coding (LPC) and cepstral analysis, together with formant and pitch estimation, yield compact, trainable features that support predictive coding and excitation modeling. Vector quantization and early statistical pattern recognition shape the ASR pipeline, while Hidden Markov Models (HMMs) begin to emerge for speaker-independent isolated-word recognition, introducing model-based approaches. Acoustic cues such as spectral formants, cepstral pitch, and voicing inform feature extraction and decision rules.

Time alignment and similarity measures became the central paradigm for speech recognition, with Dynamic Time Warping (DTW) enabling time-normalized matching and dynamic-programming-based warping guiding word-level alignment across utterances [9], [16], [7], [12], [10].
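To make the idea of time-normalized matching concrete, the following is a minimal sketch of DTW between two one-dimensional sequences. It assumes an absolute-difference local cost and the simplest step pattern (insert, delete, match); the function name `dtw_distance` is illustrative and not taken from the cited works, which typically operate on frame-level feature vectors and add slope constraints.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D sequences.

    Assumptions (illustrative only): local cost is |a[i] - b[j]| and the
    allowed steps are vertical, horizontal, and diagonal with equal weight.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Dynamic-programming recurrence: extend the cheapest predecessor path
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b holds (stretch b)
                cost[i][j - 1],      # b advances, a holds (stretch a)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Under these assumptions, a template and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align with zero accumulated cost, which is the property that lets a recognizer compare utterances spoken at different rates.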

Front-end representations grounded in linear prediction, cepstral analysis, and formant/pitch estimation provided compact, trainable features for recognition, with Linear Predictive Coding (LPC) and cepstrum-based methods enabling predictive coding and excitation modeling [1], [19], [4], [5], [3].
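As a rough illustration of how linear prediction yields a compact representation, the sketch below estimates predictor coefficients for a single frame using the autocorrelation method and the Levinson-Durbin recursion. The function names and the convention that a sample is predicted as a weighted sum of the previous samples are assumptions made for this example; practical front ends also apply windowing and pre-emphasis, which are omitted here.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values.

    Returns (a, err), where a[k-1] weights x[t-k] in the prediction
    x_hat[t] = sum_k a[k-1] * x[t-k], and err is the residual energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; indices follow the lag
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for extending the predictor to order i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc(frame, order):
    """Convenience wrapper (hypothetical name): frame in, coefficients out."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

For a decaying signal in which every sample is half the previous one, a second-order fit should recover a first coefficient close to 0.5 and a second close to zero, so a handful of numbers summarizes the whole frame.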

Vector quantization and statistical pattern recognition shaped early automatic speech recognition pipelines, through Vector Quantization (VQ) codebook design, LPC front ends, and the emerging integration of Hidden Markov Models for speaker-independent isolated-word recognition [6], [17], [20], [8].
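A vector quantizer maps each feature vector to the index of its nearest codeword, so an utterance becomes a sequence of discrete symbols. The sketch below trains a small codebook with plain Lloyd (k-means style) iterations under squared Euclidean distance. It is an assumption-laden illustration: the names `train_codebook` and `nearest` are hypothetical, the initialization is a simple evenly spaced pick, and historical designs often used codeword splitting and LPC-specific distortion measures instead.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def train_codebook(vectors, k, iterations=20):
    """Lloyd iterations: assign vectors to cells, then move codewords to centroids.

    Assumes len(vectors) >= k; initial codewords are evenly spaced training vectors.
    """
    step = len(vectors) // k
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook
```

Once trained, `nearest(codebook, frame)` turns each frame into a symbol index, which is the kind of discrete observation that early HMM-based recognizers consumed.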

Acoustic-phonetic cue research established spectral formants, cepstral pitch, and voicing cues as foundational signals for speech perception and recognition, informing feature extraction and decision rules [5], [4], [15], [2].

Time-Delay Neural Network Era

1986 - 2001

Neural Sequence Modeling Emergence

2002 - 2008

Deep Neural Acoustic Modeling

2009 - 2015

Self-Supervised End-to-End Speech

2016 - 2024